EdgeNeXt: Efficiently Amalgamated CNN-Transformer Architecture for Mobile Vision Applications

Abstract

In the pursuit of achieving ever-increasing accuracy, large and complex neural networks are usually developed. Such models demand high computational resources and therefore cannot be deployed on edge devices. It is of great interest to build resource-efficient general-purpose networks due to their usefulness in several application areas. In this work, we strive to effectively combine the strengths of both CNN and Transformer models and propose a new efficient hybrid architecture, EdgeNeXt. Specifically, in EdgeNeXt we introduce a split depth-wise transpose attention (SDTA) encoder that splits input tensors into multiple channel groups and utilizes depth-wise convolution along with self-attention across channel dimensions to implicitly increase the receptive field and encode multi-scale features. Our extensive experiments on classification, detection and segmentation tasks reveal the merits of the proposed approach, outperforming state-of-the-art methods with comparatively lower compute requirements. Our EdgeNeXt model with 1.3M parameters achieves 71.2% top-1 accuracy on ImageNet-1K, outperforming MobileViT with an absolute gain of 2.2% and a 28% reduction in FLOPs. Further, our EdgeNeXt model with 5.6M parameters achieves 79.4% top-1 accuracy on ImageNet-1K. The code and models are available at https://t.ly/_Vu9.
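The two ideas named in the abstract, splitting the input into channel groups and computing self-attention across channels rather than across spatial positions, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names are hypothetical, the query, key and value projections are assumed to be the identity, and the depth-wise convolution and any learned scaling are omitted. The point it shows is that the attention map has size C x C (channels), not N x N (tokens).

```python
import math


def split_channels(x, groups):
    """Split a (tokens x channels) matrix into `groups` equal channel groups."""
    channels = len(x[0])
    assert channels % groups == 0, "channels must divide evenly into groups"
    width = channels // groups
    return [[row[g * width:(g + 1) * width] for row in x] for g in range(groups)]


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def l2_normalize(vector):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(v * v for v in vector)) or 1.0
    return [v / norm for v in vector]


def transposed_attention(x):
    """Self-attention across channels for a (tokens x channels) matrix.

    Assumption: Q = K = V = x (no learned projections), so this only
    illustrates the shape of the computation, not a trained layer.
    """
    n, c = len(x), len(x[0])
    # Transpose so each entry is one channel seen across all n tokens.
    cols = [[x[t][j] for t in range(n)] for j in range(c)]
    unit = [l2_normalize(col) for col in cols]
    # attn[i][j]: how much channel i attends to channel j (a C x C map).
    attn = [
        softmax([sum(a * b for a, b in zip(unit[i], unit[j])) for j in range(c)])
        for i in range(c)
    ]
    # Each output channel is an attention-weighted mix of the input channels.
    out_cols = [
        [sum(attn[i][j] * cols[j][t] for j in range(c)) for t in range(n)]
        for i in range(c)
    ]
    out = [[out_cols[i][t] for i in range(c)] for t in range(n)]
    return out, attn
```

Because the attention map is C x C, its cost in this sketch grows linearly with the number of tokens, whereas token-to-token attention grows quadratically; this is one plausible reading of why channel-wise attention suits a mobile-oriented design.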

Related resources

Architecture for Adaptive Mobile Applications

Mobile computing is a relatively new field. While the challenges arising from mobility and the limitations of the portable devices are relatively well understood, there is no consensus yet as to what should be done to address these challenges. A comprehensive solution has to address many different aspects, such as the issue of dynamically changing bandwidth, the power, computational, and other ...

Architecture for Multimodal Mobile Applications

Applications that are used on mobile terminals require a sophisticated input / output interface in order to ease the user application dialogue. Convenient data I/O can be achieved by the introduction of multimodal channels like stylus, speech recognition / synthesis, and gestures. A complex requirement is that the channels have to be applicable in parallel, e.g. tapping on a map and speaking “z...

Flexible Power Electronic Transformer for Power Flow Control Applications

This paper proposes a Flexible Power Electronic Transformer (FPET) for the application in the micro-grids. The low frequency transformer is usually used at the Point of Common Coupling (PCC) to connect the low voltage grid and utility network to each other. The conventional 50Hz transformer results in enhanced low voltage-grid power management system during grid-connected operation. In this pap...

An Architecture for Adaptive Mobile Applications

Mobile applications execute in an environment characterized by scarce and dynamically varying resources. We believe that applications have to adapt dynamically and transparently to the amount of resources available at runtime. To achieve this goal, we use the conventional extension of the client-server model to a client-proxy-server model. The mobile devices execute the client, which provides th...

A Managed Architecture for Mobile Distributed Applications

Internet-distributed systems are beginning to offer a serious platform for stable, long-lived, flexible applications development. Moreover, such Internet-scale applications are motivating new modes of work and company integration such as teleworking and virtual enterprises. There is an increasing realisation that within such flexible working structures that applications will have to support som...

Journal

Journal title: Lecture Notes in Computer Science

Year: 2023

ISSN: 1611-3349, 0302-9743

DOI: https://doi.org/10.1007/978-3-031-25082-8_1